pacman::p_load(sf, dplyr, ggplot2, knitr, mapview, spdep, tmap, tidyverse)
Dynamic Patterns of Public Transport Usage
A Geospatial Analysis of Bus Passenger Volume in Urban Environments
1.1 Overview
In the intricate mosaic of urban transportation, bus stops serve as pivotal nodes that capture the pulse of city life through the ebb and flow of passenger trips. The study of passenger trip volume, particularly during peak hours, becomes essential for enhancing service efficiency, planning urban infrastructure, and improving the overall commuter experience.
This geospatial analysis is anchored in the bustling landscape of a metropolitan area, where data on bus stops, population distribution, urban development plans, and passenger volume intertwine to paint a comprehensive picture of transit dynamics.
This analysis aims to dissect the rhythms of urban mobility in Singapore through the following analyses:
Geovisualization and Analysis
- Compute the passenger trips
- Visualize the geographical distribution of the passenger trips
Local Indicators of Spatial Association (LISA) Analysis
- Compute LISA of the passenger trips
- Visualize the LISA maps of the passenger trips
Emerging Hot Spot Analysis (EHSA)
- Perform Mann-Kendall Test by using the spatio-temporal local Gi* values
- Visualize EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level
1.2 Load packages
The following packages will be useful
- sf: imports and handles geospatial data
- tidyverse: performs aspatial data import, wrangling and visualization
- dplyr: performs relational joins
- spdep: computes spatial weights, spatially lagged variables and spatial autocorrelation
- ggplot2, mapview and tmap: support data visualisation
1.3 Import data
We will be using the following geospatial (busstops, mpsz) and aspatial (odbus, pop) datasets.
st_geometry() displays basic information of the feature class, such as type of geometry, the geographic extent of the features and the coordinate system of the data. glimpse() transposes the columns in a dataset and makes it possible to see the column name, data type and values in every column in a data frame.
busstops contains the detailed information for all bus stops currently serviced by buses, including bus stop code, road name, description, location coordinates.
Source: LTA DataMall (Postman URL)
busstops <- st_read(dsn = "data/BusStopLocation_Jul2023", layer = "BusStop") %>%
st_transform(crs = 3414)
Reading layer `BusStop' from data source
`/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/BusStopLocation_Jul2023'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
st_geometry(busstops)
Geometry set for 5161 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
The output indicates that the geospatial objects are point features. There are 5161 features and 3 fields, in the SVY21 projected coordinate system with XY dimensions.
mpsz is the Master Plan 2019, a forward-looking guiding plan for Singapore’s development in the medium term over the next 10 to 15 years, published in 2019. Note that this mpsz differs from the one used in the previous chapter, Data Wrangling.
Source: URA (Download here)
mpsz = st_read(dsn="data/MPSZ-2019", layer="MPSZ-2019") %>%
st_transform(crs=3414)
Reading layer `MPSZ-2019' from data source
`/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/MPSZ-2019'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
The output indicates that the geospatial objects are multipolygon features. There are 332 features and 6 fields. The source layer is in the WGS 84 geodetic coordinate system with XY dimensions; st_transform() reprojects it to SVY21 (EPSG:3414).
odbus contains the number of trips on weekdays and weekends from origin to destination bus stops. It reflects passenger trip traffic; the most recent dataset, from October 2023, will be used.
Source: LTA DataMall (Postman URL)
odbus = read_csv("data/origin_destination_bus_202310.csv")
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <fct> 04168, 04168, 80119, 80119, 44069, 20281, 20281, 1…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 20141, 20141, 1…
$ TOTAL_TRIPS <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
The output indicates 5,694,297 records and 7 fields. The bus stop codes are converted to factors for data handling.
pop contains Singapore residents grouped by planning area or subzone, age group, sex and floor area of residence. The data period is from June 2011 onwards. From June 2011 to 2019, the planning areas refer to areas demarcated in the Master Plan 2014, and from June 2020 onwards will be Master Plan 2019.
Source: Department of Statistics (Link)
pop <- read_csv("data/respopagesexfa2011to2020/respopagesexfa2011to2020.csv")
glimpse(pop)
Rows: 738,492
Columns: 7
$ PA <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo K…
$ SZ <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Kio T…
$ AG <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to…
$ Sex <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Females", …
$ FA <chr> "<= 60", ">60 to 80", ">80 to 100", ">100 to 120", ">120", "Not A…
$ Pop <dbl> 0, 10, 30, 80, 20, 0, 0, 10, 40, 90, 10, 0, 0, 10, 30, 110, 30, 0…
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
The output indicates 738,492 records and 7 fields, illustrating the population distribution by planning area (PA), subzone (SZ), age group (AG), sex (Sex), residence floor area (FA), resident count (Pop) and year (Time).
1.4 Create Spatial Grids
Spatial grids are commonly used in spatial analysis to divide the study area into equal-sized, regular polygons that tessellate the area of interest. Square grids rarely reflect real-world situations, whereas hexagons are compact and can overcome oddly shaped geographical units.
hexagon is a hexagonal grid layer in which 250 m is the perpendicular distance from the centre of each hexagon to its edges (a cellsize of 500 m in st_make_grid()). It substitutes for the multipolygons in mpsz, which are relatively coarse and irregular.
Step 1
First, we will create hexagonal grids using st_make_grid() to cover the geometry of busstops. We also assign a grid id to each hexagon by generating a sequence from 1 to the number of hexagons in area_hexagon_grid.
area_hexagon_grid = st_make_grid(busstops, cellsize = 500, what = "polygons", square = FALSE)
hexagon_grid_sf = st_sf(area_hexagon_grid) %>%
mutate(grid_id = seq_along(area_hexagon_grid))  # one id per hexagon
hexagon_grid_sf
Simple feature collection with 5580 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
area_hexagon_grid grid_id
1 POLYGON ((3720.122 26626.44... 1
2 POLYGON ((3720.122 27492.46... 2
3 POLYGON ((3720.122 28358.49... 3
4 POLYGON ((3720.122 29224.51... 4
5 POLYGON ((3720.122 30090.54... 5
6 POLYGON ((3720.122 30956.57... 6
7 POLYGON ((3720.122 31822.59... 7
8 POLYGON ((3720.122 32688.62... 8
9 POLYGON ((3720.122 33554.64... 9
10 POLYGON ((3720.122 34420.67... 10
The output indicates that the geospatial objects are polygon features. There are 5580 features and 1 field, in the SVY21 projected coordinate system with XY dimensions, the same as the busstops dataset.
Step 2
Second, st_intersects() matches the bus stops in busstops to each hexagon in hexagon_grid_sf, and lengths() counts the number of bus stops in each grid (note that this count overwrites the grid_id column). Grids without any bus stops are then removed by filter() to reflect the actual distribution.
# Count bus stops per hexagon (this overwrites grid_id with the count)
hexagon_grid_sf$grid_id = lengths(st_intersects(hexagon_grid_sf, busstops))
hexagon_count = filter(hexagon_grid_sf, grid_id > 0)
hexagon_count
Simple feature collection with 1524 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
area_hexagon_grid grid_id
1 POLYGON ((3970.122 27925.48... 1
2 POLYGON ((4220.122 28358.49... 1
3 POLYGON ((4470.122 30523.55... 1
4 POLYGON ((4720.122 28358.49... 1
5 POLYGON ((4720.122 30090.54... 2
6 POLYGON ((4720.122 30956.57... 1
7 POLYGON ((4720.122 31822.59... 1
8 POLYGON ((4970.122 28791.5,... 1
9 POLYGON ((4970.122 29657.53... 1
10 POLYGON ((4970.122 30523.55... 2
The output indicates that the geospatial objects retain polygon features, and a grid can contain more than one bus stop. The intersection of busstops and hexagon_grid_sf yields 1524 features and 1 field, indicating that only 1524 of the 5580 hexagons contain bus stops. It is in the SVY21 projected coordinate system with XY dimensions.
Step 3
Using tmap, the distribution of bus stops across Singapore can be visualised in the interactive map below.
tmap_mode("view")
hexagon = tm_shape(hexagon_count)+
tm_fill(
col = "grid_id",
palette = "Reds",
style = "cont",
title = "Number of Bus Stops in Singapore",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.format = list(
grid_id = list(format = "f", digits = 0)
)
)+
tm_borders(col = "grey40", lwd = 0.7)
hexagon
The geospatial distribution of bus stops in Singapore, as visualized in the map above, is extensive and dispersed throughout the central region, with darker shades of red signifying higher concentrations of stops. Outlying regions along the coastal areas show fewer bus stops, indicated by lighter-coloured hexagons or white spaces where no stops are present. This pattern suggests a robust public transport infrastructure in urban and densely populated areas, tapering off in less populated or industrial regions.
1.5 Explore data
mapview creates interactive visualisations of spatial data to examine and visually investigate both aspects of spatial data, the geometries and their attributes.
plot() takes parameters specifying points in the diagram. At its simplest, it plots two numbers against each other. With sf datasets, it generates maps of the specified columns/attributes, plotting up to nine attributes by default (or all of them via the max.plot argument).
mapview(busstops, cex = 3, alpha = 0.5, popup = NULL)
# Filter hexagons that contain more than 8 bus stops
hexagon_red = filter(hexagon_grid_sf, grid_id>8)
tmap_mode("view")
redhex = tm_shape(hexagon_red)+
tm_fill(
col = "grid_id",
palette = "Reds",
style = "cont",
title = "Number of Bus Stops",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.format = list(
grid_id = list(format = "f", digits = 0)
)
)+
tm_borders(col = "grey40", lwd = 0.7)
redhex
Hexagons with more than 8 bus stops include:
- Sembawang MRT (9 bus stops)
- Pasir Ris (11 bus stops)
# Summarize the total trips by day type
total_trips <- odbus %>%
group_by(DAY_TYPE) %>%
summarise(total = sum(TOTAL_TRIPS))
# Plot using ggplot2
ggplot(total_trips, aes(x = DAY_TYPE, y = total, fill = DAY_TYPE)) +
geom_bar(stat = "identity") +
theme_minimal() +
labs(title = "Total Trips by Day Type", x = "Day Type", y = "Total Trips")
plot(mpsz["PLN_AREA_N"])
1.6 Geovisualisation and Analysis
1.6.1 Compute the passenger trips generated by origin
Step 1: Classify by time interval
# Function to assign peak and non-peak times
time_interval <- function(day_type, time_per_hour) {
if (day_type == "WEEKDAY") {
if (time_per_hour >= 6 & time_per_hour <= 9) {
"morning peak"
} else if (time_per_hour >= 17 & time_per_hour <= 20) {
"afternoon peak"
} else {
"non peak"
}
} else if (day_type == "WEEKENDS/HOLIDAY") {
if (time_per_hour >= 11 & time_per_hour <= 14) {
"morning peak"
} else if (time_per_hour >= 16 & time_per_hour <= 19) {
"evening peak"
} else {
"non peak"
}
} else {
"non peak"
}
}
# Assuming 'odbus' is your data frame
odbus$TIME <- mapply(time_interval, odbus$DAY_TYPE, odbus$TIME_PER_HOUR)
# Checking the first few rows of the data frame to verify the new column
head(odbus)
# A tibble: 6 × 8
YEAR_MONTH DAY_TYPE TIME_PER_H…¹ PT_TYPE ORIGI…² DESTI…³ TOTAL…⁴ TIME
<chr> <chr> <dbl> <chr> <fct> <fct> <dbl> <chr>
1 2023-10 WEEKENDS/HOLIDAY 16 BUS 04168 10051 3 even…
2 2023-10 WEEKDAY 16 BUS 04168 10051 5 non …
3 2023-10 WEEKENDS/HOLIDAY 14 BUS 80119 90079 3 morn…
4 2023-10 WEEKDAY 14 BUS 80119 90079 5 non …
5 2023-10 WEEKDAY 17 BUS 44069 17229 4 afte…
6 2023-10 WEEKENDS/HOLIDAY 17 BUS 20281 20141 1 even…
# … with abbreviated variable names ¹TIME_PER_HOUR, ²ORIGIN_PT_CODE,
# ³DESTINATION_PT_CODE, ⁴TOTAL_TRIPS
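The row-wise mapply() call above works, but it evaluates the classifier once per row, which can be slow on the 5.7 million rows of odbus. A hedged, equivalent sketch using a vectorized dplyr::case_when() (the function name classify_time is hypothetical, not from the original code):

```r
library(dplyr)

# Vectorized alternative to the row-wise time_interval()/mapply() approach:
# case_when() evaluates whole columns at once instead of one row at a time.
classify_time <- function(day_type, time_per_hour) {
  case_when(
    day_type == "WEEKDAY" & time_per_hour >= 6  & time_per_hour <= 9  ~ "morning peak",
    day_type == "WEEKDAY" & time_per_hour >= 17 & time_per_hour <= 20 ~ "afternoon peak",
    day_type == "WEEKENDS/HOLIDAY" & time_per_hour >= 11 & time_per_hour <= 14 ~ "morning peak",
    day_type == "WEEKENDS/HOLIDAY" & time_per_hour >= 16 & time_per_hour <= 19 ~ "evening peak",
    TRUE ~ "non peak"
  )
}

# Small example
classify_time(c("WEEKDAY", "WEEKDAY", "WEEKENDS/HOLIDAY"), c(7, 16, 12))
# → "morning peak" "non peak" "morning peak"
```

With this sketch, the column could be created as odbus$TIME <- classify_time(odbus$DAY_TYPE, odbus$TIME_PER_HOUR), producing the same labels as the mapply() version.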
Step 2: Join datasets
passengertrips <- left_join(busstops, odbus,
by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))
passengertrips_day <- passengertrips %>%
group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
summarise(
WEEKDAY_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY"], na.rm = TRUE),
WEEKENDS_HOLIDAYS_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY"], na.rm = TRUE),
.groups = "drop"
)
Determine peak or non-peak
passengertrips_time <- passengertrips %>%
group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
summarise(
WEEKDAY_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
WEEKDAY_AFTERNOON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "afternoon peak"], na.rm = TRUE),
WEEKDAY_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
WEEKENDS_HOLIDAYS_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
WEEKENDS_HOLIDAYS_EVENING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "evening peak"], na.rm = TRUE),
WEEKENDS_HOLIDAYS_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
.groups = "drop"
)
passengertrips_time <- passengertrips_time %>%
filter(!(WEEKDAY_MORNING_PEAK == 0
& WEEKDAY_AFTERNOON_PEAK == 0
& WEEKDAY_NON_PEAK == 0
& WEEKENDS_HOLIDAYS_MORNING_PEAK == 0
& WEEKENDS_HOLIDAYS_EVENING_PEAK == 0
& WEEKENDS_HOLIDAYS_NON_PEAK == 0))
Step 3: Ensure both datasets are in the same coordinate reference system (CRS).
st_crs(passengertrips_day)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(passengertrips_time)
Coordinate Reference System:
  User input: EPSG:3414
  (full SVY21 / Singapore TM WKT identical to the output shown above for passengertrips_day)
st_crs(hexagon_grid_sf)
Coordinate Reference System:
  User input: EPSG:3414
  (full SVY21 / Singapore TM WKT identical to the output shown above for passengertrips_day)
Step 4: Perform a spatial join to match trips to hexagons
passengergrid_day <- st_join(hexagon_grid_sf, passengertrips_day, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_day <- passengergrid_day %>%
filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_day)
Rows: 5,161
Columns: 8
$ grid_id <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1…
$ BUS_STOP_N <chr> "25059", "25751", "26379", "25761", "25719", "…
$ BUS_ROOF_N <chr> "UNK", "B02D", "NIL", "B03", "B01C", "NIL", "N…
$ LOC_DESC <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE 14", "Y…
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2…
$ WEEKDAY_TRIPS <dbl> 526, 468, 1055, 3420, 4584, 944, 689, 768, 644…
$ WEEKENDS_HOLIDAYS_TRIPS <dbl> 105, 96, 247, 816, 1688, 379, 248, 147, 108, 3…
$ area_hexagon_grid <POLYGON [m]> POLYGON ((3970.122 27925.48..., POLYGO…
passengergrid_time <- st_join(hexagon_grid_sf, passengertrips_time, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_time <- passengergrid_time %>%
filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_time)
Rows: 5,029
Columns: 12
$ grid_id <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, …
$ BUS_STOP_N <chr> "25059", "25751", "26379", "25761", "25…
$ BUS_ROOF_N <chr> "UNK", "B02D", "NIL", "B03", "B01C", "N…
$ LOC_DESC <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE …
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-…
$ WEEKDAY_MORNING_PEAK <dbl> 103, 52, 78, 185, 815, 301, 53, 60, 64,…
$ WEEKDAY_AFTERNOON_PEAK <dbl> 390, 114, 291, 1905, 2600, 299, 241, 36…
$ WEEKDAY_NON_PEAK <dbl> 33, 302, 686, 1330, 1169, 344, 395, 340…
$ WEEKENDS_HOLIDAYS_MORNING_PEAK <dbl> 0, 26, 52, 187, 367, 88, 76, 45, 21, 39…
$ WEEKENDS_HOLIDAYS_EVENING_PEAK <dbl> 56, 14, 100, 346, 533, 101, 55, 49, 53,…
$ WEEKENDS_HOLIDAYS_NON_PEAK <dbl> 49, 56, 95, 283, 788, 190, 117, 53, 34,…
$ area_hexagon_grid <POLYGON [m]> POLYGON ((3970.122 27925.48...,…
Step 5: Split passenger trips into weekday and weekend
Day
# Subset for Weekday
weekday_trips <- passengergrid_day %>%
group_by(BUS_STOP_N) %>%
summarise(
weekday_trips = sum(WEEKDAY_TRIPS, na.rm = TRUE),
)
# Subset for Weekend
weekend_trips <- passengergrid_day %>%
group_by(BUS_STOP_N) %>%
summarise(
weekend_trips = sum(WEEKENDS_HOLIDAYS_TRIPS, na.rm = TRUE),
)
Time
# First, ensure all necessary columns are present in the dataframe
passengergrid_clean <- passengergrid_time %>%
mutate(
WEEKDAY_MORNING_PEAK = ifelse(is.na(WEEKDAY_MORNING_PEAK), 0, WEEKDAY_MORNING_PEAK),
WEEKDAY_AFTERNOON_PEAK = ifelse(is.na(WEEKDAY_AFTERNOON_PEAK), 0, WEEKDAY_AFTERNOON_PEAK),
WEEKENDS_HOLIDAYS_MORNING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_MORNING_PEAK), 0, WEEKENDS_HOLIDAYS_MORNING_PEAK),
WEEKENDS_HOLIDAYS_EVENING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_EVENING_PEAK), 0, WEEKENDS_HOLIDAYS_EVENING_PEAK)
)
# Function to summarise and filter bus stops with no trips
summarise_and_filter <- function(data, column) {
data %>%
group_by(BUS_STOP_N) %>%
summarise(Total_Trips = sum({{ column }}, na.rm = TRUE)) %>%
filter(Total_Trips > 0) %>%
ungroup()
}
# Create subsets using the function
weekday_morning_peak <- summarise_and_filter(passengergrid_clean,
WEEKDAY_MORNING_PEAK)
weekday_afternoon_peak <- summarise_and_filter(passengergrid_clean,
WEEKDAY_AFTERNOON_PEAK)
weekend_morning_peak <- summarise_and_filter(passengergrid_clean,
WEEKENDS_HOLIDAYS_MORNING_PEAK)
weekend_evening_peak <- summarise_and_filter(passengergrid_clean,
WEEKENDS_HOLIDAYS_EVENING_PEAK)
# Check for any BUS_STOP_N that might still have issues
problematic_stops <- passengergrid_clean %>%
filter(is.na(BUS_STOP_N)) %>%
pull(BUS_STOP_N) %>%
unique()
# If problematic_stops has any values, you may need to address these specifically,
# for example by removing them from passengergrid_clean before creating the subsets
if (length(problematic_stops) > 0) {
passengergrid_clean <- passengergrid_clean %>%
filter(!(BUS_STOP_N %in% problematic_stops))
}
Step 6: Plot passenger trip traffic
# weekday_trips is already an sf object with hexagon geometry, so no coordinate conversion is needed
weekday_sf <- weekday_trips
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekday_sf) +
tm_polygons("weekday_trips",
palette = "Blues",
border.col = "grey40",
breaks = breaks,
title = "Weekday Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekday Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
# weekday_morning_peak is already an sf object, so no coordinate conversion is needed
weekday_morning_peak_sf <- weekday_morning_peak
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekday_morning_peak_sf) +
tm_polygons("Total_Trips",
palette = "Blues",
border.col = "grey40",
breaks = breaks,
title = "Weekday Morning Peak Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekday Morning Peak Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
# weekday_afternoon_peak is already an sf object, so no coordinate conversion is needed
weekday_afternoon_peak_sf <- weekday_afternoon_peak
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekday_afternoon_peak_sf) +
tm_polygons("Total_Trips",
palette = "Blues",
border.col = "grey40",
breaks = breaks,
title = "Weekday Afternoon Peak Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekday Afternoon Peak Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
# weekend_trips is already an sf object, so no coordinate conversion is needed
weekend_sf <- weekend_trips
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekend_sf) +
tm_polygons("weekend_trips",
palette = "Purples",
border.col = "grey40",
breaks = breaks,
title = "Weekend Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekend Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
# weekend_morning_peak is already an sf object, so no coordinate conversion is needed
weekend_morning_peak_sf <- weekend_morning_peak
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekend_morning_peak_sf) +
tm_polygons("Total_Trips",
palette = "Purples",
border.col = "grey40",
breaks = breaks,
title = "Weekend Morning Peak Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekend Morning Peak Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
# weekend_evening_peak is already an sf object, so no coordinate conversion is needed
weekend_evening_peak_sf <- weekend_evening_peak
# Define custom class breaks for the choropleth
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)
# Print the map
tmap_mode("view")
tm_shape(weekend_evening_peak_sf) +
tm_polygons("Total_Trips",
palette = "Purples",
border.col = "grey40",
breaks = breaks,
title = "Weekend Evening Peak Trips") +
tm_scale_bar()+
tm_layout(main.title = "Weekend Evening Peak Trips by Bus Stop",
main.title.position = "center",
frame = FALSE)
1.7 Local Indicators of Spatial Association (LISA) Analysis
1.7.1 Compute LISA of the passenger trips generated by origin at the hexagon level
Method 1: Computing Contiguity Spatial Weights
poly2nb() defines the spatial relationship between regions by computing contiguity weight matrices, building a neighbours list from regions with contiguous boundaries.
passengergrid_day_total <- passengergrid_day %>%
mutate(TOTAL_TRIPS = WEEKDAY_TRIPS + WEEKENDS_HOLIDAYS_TRIPS)
wm_q_day <- poly2nb(passengergrid_day_total, queen=TRUE)
summary(wm_q_day)
Neighbour list object:
Number of regions: 5161
Number of nonzero links: 112788
Percentage nonzero weights: 0.4234432
Average number of links: 21.8539
7 regions with no links:
1767 2407 2884 3350 4627 5154 5222
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
7 12 17 27 36 89 67 103 85 71 125 135 105 140 150 139 189 191 222 153
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
186 161 244 261 220 212 200 205 196 173 117 128 135 117 114 69 62 49 34 50
40 41 42 43 44 45 46 49
46 52 11 25 13 6 4 8
12 least connected regions:
34 251 313 1543 3507 3507.1 5129 5159 5278 5278.1 5558 5558.1 with 1 link
8 most connected regions:
3292 3292.1 3292.2 3292.3 3292.4 3292.5 3292.6 3292.7 with 49 links
The output is a summary of a neighbour list object built with a Queen contiguity weight matrix. There are 5,161 regions, and about 0.42% of all possible region pairs are linked as neighbours. Each region has an average of 21.85 neighbours. Among the poorly connected regions, 7 have no neighbours and 12 have only 1 neighbour; the 8 most connected regions have 49 neighbours each.
passengergrid_time_total <- passengergrid_time %>%
mutate(TOTAL_TRIPS = WEEKDAY_MORNING_PEAK + WEEKDAY_AFTERNOON_PEAK + WEEKDAY_NON_PEAK +
WEEKENDS_HOLIDAYS_MORNING_PEAK + WEEKENDS_HOLIDAYS_EVENING_PEAK + WEEKENDS_HOLIDAYS_NON_PEAK)
wm_q_time <- poly2nb(passengergrid_time_total, queen=TRUE)
summary(wm_q_time)
Neighbour list object:
Number of regions: 5029
Number of nonzero links: 107980
Percentage nonzero weights: 0.426953
Average number of links: 21.47147
7 regions with no links:
1767 2407 2884 3350 4627 5154 5222
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
7 14 17 30 28 94 67 103 82 81 116 137 122 137 130 159 166 230 197 163
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
157 177 257 237 209 197 216 217 181 185 127 128 93 121 97 47 62 48 9 27
40 41 42 43 44 48
56 47 17 19 10 8
14 least connected regions:
34 251 313 1543 2135 2135.1 3507 3507.1 5129 5159 5278 5278.1 5558 5558.1 with 1 link
8 most connected regions:
3292 3292.1 3292.2 3292.3 3292.4 3292.5 3292.6 3292.7 with 48 links
The output is a summary of a neighbour list object. There are 5,029 regions, and about 0.43% of all possible region pairs are linked as neighbours. Each region has an average of 21.47 neighbours. Among the poorly connected regions, 7 have no neighbours and 14 have only 1 neighbour; the 8 most connected regions have 48 neighbours each.
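With contiguity weights in hand, the LISA values promised in the overview can be computed with spdep::localmoran(). A minimal, self-contained sketch on a synthetic 4 x 4 polygon grid (the real analysis would use passengergrid_day_total and wm_q_day instead of the toy objects below):

```r
library(sf)
library(spdep)

# Build a small synthetic polygon grid with a random trip attribute
set.seed(123)
grid <- st_make_grid(st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4))),
                     n = c(4, 4), square = TRUE)
grid_sf <- st_sf(trips = runif(16, 0, 1000), geometry = grid)

# Queen contiguity neighbours, as in the analysis above
nb <- poly2nb(grid_sf, queen = TRUE)
# zero.policy = TRUE tolerates regions with no neighbours, like the 7 seen above
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Local Moran's I for each polygon
lisa <- localmoran(grid_sf$trips, lw, zero.policy = TRUE)
head(lisa)  # columns: Ii, E.Ii, Var.Ii, Z.Ii, p-value
```

The resulting Ii and p-value columns are what the LISA cluster maps are drawn from: significant positive Ii values indicate high-high or low-low clusters, significant negative values indicate spatial outliers.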
Method 2: Computing distance-based neighbours
passengergrid is made up of polygon features, so we need point locations in order to build connectivity graphs. The most typical choice is polygon centroids.
longitude <- map_dbl(passengergrid_time_total$area_hexagon_grid, ~st_centroid(.x)[[1]])
latitude <- map_dbl(passengergrid_time_total$area_hexagon_grid, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
head(coords)
     longitude latitude
[1,] 3970.122 28214.15
[2,] 4220.122 28647.16
[3,] 4470.122 30812.23
[4,] 4720.122 28647.16
[5,] 4720.122 30379.22
[6,] 4720.122 30379.22
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0     0.0     0.0   402.8     0.0 16228.8
The summary report shows that the largest first nearest neighbour distance is 16,228.8, so using this as the upper threshold ensures that every unit has at least one neighbour. Note, however, that the coordinates here are projected SVY21 values in metres, so longlat = TRUE is not strictly appropriate; with longlat = FALSE the distances would be returned in metres.
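Because the centroids are projected coordinates in metres, a sketch of the same first-nearest-neighbour distance computation with longlat = FALSE, on synthetic coordinates standing in for the hexagon centroids (the coords_m object and its value range are illustrative, not from the original data):

```r
library(spdep)

# Synthetic projected coordinates in metres, mimicking the SVY21 extent above
set.seed(1)
coords_m <- cbind(runif(30, 4000, 48000), runif(30, 26000, 53000))

# First nearest neighbour of each point
k1_demo <- knn2nb(knearneigh(coords_m, k = 1))
# longlat = FALSE: coordinates are projected, so distances come back in metres
k1dists_demo <- unlist(nbdists(k1_demo, coords_m, longlat = FALSE))
summary(k1dists_demo)
```

The maximum of this summary, in metres, would then be the defensible upper threshold to pass to dnearneigh(..., longlat = FALSE).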
wm_d16229 <- dnearneigh(coords, 0, 16229, longlat = TRUE)
wm_d16229
Neighbour list object:
Number of regions: 5029
Number of nonzero links: 22604344
Percentage nonzero weights: 89.37759
Average number of links: 4494.799
# Make a new list with only the first 5 elements
wm_d16229_subset <- wm_d16229[1:5]
# Now use str() on this subset
str(wm_d16229_subset)
List of 5
$ : int [1:4468] 2 4 5 6 8 9 10 11 12 13 ...
$ : int [1:4679] 1 3 4 5 6 7 8 9 10 11 ...
$ : int [1:4681] 2 4 5 6 7 8 9 10 11 12 ...
$ : int [1:4703] 1 2 3 5 6 7 8 9 10 11 ...
$ : int [1:4562] 1 2 3 4 6 7 8 9 10 11 ...
Next, we can derive a spatial weight matrix based on the inverse distance method.
#dist_day <- nbdists(wm_q_day, coords, longlat = TRUE)
#ids_day <- lapply(dist_day, function(x) 1/(x))
#ids_day
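The commented-out lines above sketch the idea; a runnable, self-contained version on synthetic projected coordinates follows (the coords_demo object is illustrative — the real analysis would pass wm_q_day and the hexagon centroids):

```r
library(spdep)

# Synthetic projected coordinates in metres, standing in for hexagon centroids
set.seed(42)
coords_demo <- cbind(runif(20, 0, 50000), runif(20, 0, 50000))

# Neighbours: 4 nearest points for each observation
nb <- knn2nb(knearneigh(coords_demo, k = 4))
# longlat = FALSE because the coordinates are projected, so distances are in metres
dists <- nbdists(nb, coords_demo, longlat = FALSE)
# Inverse distance: nearer neighbours get larger weights
inv_dists <- lapply(dists, function(x) 1 / x)

# Supply the inverse distances as general weights via glist;
# style = "B" keeps the raw general weights without row-standardisation
lw_idw <- nb2listw(nb, glist = inv_dists, style = "B", zero.policy = TRUE)
summary(unlist(lw_idw$weights))
```

The resulting listw object can then be used in place of a binary contiguity listw wherever spatial lags or autocorrelation statistics are computed.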